Main Questions
Our main question:
- Question 1
- Question 2 etc.
Disclaimer: The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given data set, and should not be used in the context of making policy decisions without external consultation from scientific experts.
This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.
To cite this case study please use:
Wright, Carrie and Ontiveros, Michael and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). https://github.com/opencasestudies/ocs-bp-co2-emissions. Exploring CO2 emissions across time (Version v1.0.0).
To access the GitHub repository for this case study see here: https://github.com/opencasestudies/ocs-bp-co2-emissions. This case study is part of a series of public health case studies for the Bloomberg American Health Initiative.
Content content
Content for quotes
for large images from the web… might do this instead:
To underline something:
Bold
Italics
underline and bold
underline and bold and italics
List:
1) Make sure there are two spaces
2) after each item to create a new line
Yanosky, J. D. et al. Spatio-temporal modeling of particulate air pollution in the conterminous United States using geographic and meteorological predictors. Environ Health 13, 63 (2014).
Our main question:
In this case study, we will explore CO2 emission data from around the world. We will also focus on the US specifically to evaluate patterns of temperatures and natural disaster activity.
This case study will particularly focus on how to use different datasets that span different ranges of time, as well as how to create visualizations of patterns over time. We will especially focus on using packages and functions from the tidyverse, such as dplyr, tidyr, and ggplot2.
The tidyverse is a collection of packages created by RStudio. While some students may be familiar with earlier R programming packages, the tidyverse packages make data science in R especially legible and intuitive.
The skills, methods, and concepts that students will be familiar with by the end of this case study are:
Data Science Learning Objectives:
- dplyr for data wrangling
- dplyr
- ggplot2
- ggplot2 plots
- ggplot2 plots

Statistical Learning Objectives:
We will begin by loading the packages that we will need:
library(here)
library(readr)
library(dplyr)

| Package | Use |
|---|---|
| here | to easily load and save data |
| readr | to import the CSV file data |
The first time we use a function, we will use the :: operator to indicate which package it comes from. This is not necessary unless function names overlap between packages, but we include it here to show where each function comes from.
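As a minimal sketch of why the :: notation matters when function names overlap: both the stats package (attached by default in R) and dplyr export a function called filter(). The data frame `scores` and the object names below are hypothetical and only serve as an illustration.

```r
library(dplyr)

# Hypothetical toy data, used only for illustration.
scores <- data.frame(id = 1:4, value = c(10, 20, 30, 40))

# dplyr::filter() keeps the rows for which the condition is TRUE.
high_scores <- dplyr::filter(scores, value > 20)

# stats::filter() is a different function with the same name:
# it applies a linear filter (here, a 2-point moving average) to a series.
moving_avg <- stats::filter(scores$value, rep(1 / 2, 2))

print(high_scores)
print(moving_avg)
```

Writing the package name explicitly makes it clear which of the two functions is being called, regardless of the order in which the packages were loaded.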
There are some important considerations regarding this data analysis to keep in mind:
Limitation 1
Limitation 2
If you want to make a table about variable info:
| Variable | Details |
|---|---|
| variable1 | Variable info – more details – more details Example: Content content |
| variable2 | Variable info – more details – more details Example: Content content |
Put files in docs directory and use here package.
pm <- readr::read_csv(here::here("docs", "pm25_data.csv"))

We will also use the %>% pipe, which passes the result of one step along as the input to the next step. This will make more sense when we have multiple sequential steps using the same data object. To use the pipe notation we need to install and load dplyr as well.
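To illustrate how the pipe chains sequential steps, here is a small sketch that computes the same summary twice, once with nested calls and once with %>%. The data frame `readings` and its columns are hypothetical stand-ins, not part of the case study data.

```r
library(dplyr)

# Hypothetical toy data standing in for an imported table.
readings <- data.frame(
  state = c("Ohio", "Ohio", "Utah", "Utah"),
  value = c(10, 14, 7, 9)
)

# Without the pipe: nested calls must be read from the inside out.
nested <- arrange(
  summarize(group_by(readings, state), mean_value = mean(value)),
  desc(mean_value)
)

# With the pipe: each step receives the result of the previous one,
# so the code reads top to bottom in the order the steps run.
piped <- readings %>%
  group_by(state) %>%
  summarize(mean_value = mean(value)) %>%
  arrange(desc(mean_value))

print(piped)
```

Both versions produce the same result; the piped version is usually easier to follow once there are more than two steps.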
You can add DT tables too. Note that you can't use these inside a click expand details section.
library(DT)
DT::datatable(iris)

Scrollable content:
# Scroll through the output!
pm %>%
  distinct(state) %>%
  print(n = 1e3)

# A tibble: 49 x 1
state
<chr>
1 Alabama
2 Arizona
3 Arkansas
4 California
5 Colorado
6 Connecticut
7 Delaware
8 District Of Columbia
9 Florida
10 Georgia
11 Idaho
12 Illinois
13 Indiana
14 Iowa
15 Kansas
16 Kentucky
17 Louisiana
18 Maine
19 Maryland
20 Massachusetts
21 Michigan
22 Minnesota
23 Mississippi
24 Missouri
25 Montana
26 Nebraska
27 Nevada
28 New Hampshire
29 New Jersey
30 New Mexico
31 New York
32 North Carolina
33 North Dakota
34 Ohio
35 Oklahoma
36 Oregon
37 Pennsylvania
38 Rhode Island
39 South Carolina
40 South Dakota
41 Tennessee
42 Texas
43 Utah
44 Vermont
45 Virginia
46 Washington
47 West Virginia
48 Wisconsin
49 Wyoming
To make click expand section use:
text text
Note: You cannot use scroll features inside a details section unless it is in the last header section. Otherwise, the other headers will go missing and other issues will occur.
You can still do this if you leave an open details section like this and then have a section header at the same level as this section:
text text
Scrollable content:
# Scroll through the output!
pm %>%
  distinct(state) %>%
  print(n = 1e3)

# A tibble: 49 x 1
state
<chr>
1 Alabama
2 Arizona
3 Arkansas
4 California
5 Colorado
6 Connecticut
7 Delaware
8 District Of Columbia
9 Florida
10 Georgia
11 Idaho
12 Illinois
13 Indiana
14 Iowa
15 Kansas
16 Kentucky
17 Louisiana
18 Maine
19 Maryland
20 Massachusetts
21 Michigan
22 Minnesota
23 Mississippi
24 Missouri
25 Montana
26 Nebraska
27 Nevada
28 New Hampshire
29 New Jersey
30 New Mexico
31 New York
32 North Carolina
33 North Dakota
34 Ohio
35 Oklahoma
36 Oregon
37 Pennsylvania
38 Rhode Island
39 South Carolina
40 South Dakota
41 Tennessee
42 Texas
43 Utah
44 Vermont
45 Virginia
46 Washington
47 West Virginia
48 Wisconsin
49 Wyoming
review of tidymodels
guide for preprocessing with recipes
guide for using GGally to create correlation plots
guide for using parsnip to try different algorithms or engines
recipe functions
Terms and concepts covered:
Tidyverse
RStudio cheatsheets
Inference
Regression
Different types of regression
Ordinary least squares method
Residual
Packages used in this case study:
| Package | Use |
|---|---|
| here | to easily load and save data |
| readr | to import the CSV file data |
| dplyr | to arrange/filter/select/compare specific subsets of the data |
| skimr | to get an overview of data |
| summarytools | to get an overview of data in a different style |
| pdftools | to read a PDF into R |
| magrittr | to use the %<>% piping operator |
| purrr | to perform functions on all columns of a tibble |
| tibble | to create data objects that we can manipulate with dplyr/stringr/tidyr/purrr |
| tidyr | to separate data within a column into multiple columns |
| ggplot2 | to make visualizations with multiple layers |
library(devtools)
session_info()

─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.0.3 (2020-10-10)
os macOS Mojave 10.14.6
system x86_64, darwin17.0
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2021-07-23
─ Packages ───────────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.0)
bslib 0.2.5.1 2021-05-18 [1] CRAN (R 4.0.2)
cachem 1.0.5 2021-05-15 [1] CRAN (R 4.0.2)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.0.2)
cli 3.0.0 2021-06-30 [1] CRAN (R 4.0.2)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.3)
crosstalk 1.1.1 2021-01-12 [1] CRAN (R 4.0.2)
DBI 1.1.1 2021-01-15 [1] CRAN (R 4.0.2)
desc 1.3.0 2021-03-05 [1] CRAN (R 4.0.2)
devtools * 2.4.2 2021-06-07 [1] CRAN (R 4.0.2)
digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
dplyr * 1.0.7 2021-06-18 [1] CRAN (R 4.0.2)
DT * 0.18 2021-04-14 [1] CRAN (R 4.0.2)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.2)
evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.0)
fansi 0.5.0 2021-05-25 [1] CRAN (R 4.0.2)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.2)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
generics 0.1.0 2020-10-31 [1] CRAN (R 4.0.2)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
here * 1.0.1 2020-12-13 [1] CRAN (R 4.0.2)
highr 0.9 2021-04-16 [1] CRAN (R 4.0.2)
hms 1.1.0 2021-05-17 [1] CRAN (R 4.0.2)
htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.2)
htmlwidgets 1.5.3 2020-12-10 [1] CRAN (R 4.0.2)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.0.2)
jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.2)
knitr * 1.33.6 2021-06-17 [1] Github (yihui/knitr@11eeb43)
lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.2)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.0.2)
pillar 1.6.1 2021-05-16 [1] CRAN (R 4.0.2)
pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.0.2)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.0)
pkgload 1.2.1 2021-04-06 [1] CRAN (R 4.0.2)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.0)
processx 3.5.2 2021-04-30 [1] CRAN (R 4.0.2)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.2)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.0)
R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
readr * 1.4.0 2020-10-05 [1] CRAN (R 4.0.2)
remotes 2.4.0 2021-06-02 [1] CRAN (R 4.0.2)
rlang 0.4.11 2021-04-30 [1] CRAN (R 4.0.2)
rmarkdown 2.9.1 2021-06-17 [1] Github (rstudio/rmarkdown@bafed5e)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.2)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.2)
sass 0.4.0 2021-05-12 [1] CRAN (R 4.0.2)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
stringi 1.6.2 2021-05-17 [1] CRAN (R 4.0.2)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.0)
testthat 3.0.4 2021-07-01 [1] CRAN (R 4.0.2)
tibble 3.1.2 2021-05-16 [1] CRAN (R 4.0.2)
tidyselect 1.1.1 2021-04-30 [1] CRAN (R 4.0.2)
usethis * 2.0.1 2021-02-10 [1] CRAN (R 4.0.2)
utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.2)
vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.2)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.2)
xfun 0.24 2021-06-15 [1] CRAN (R 4.0.2)
yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.0)
[1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library